Abstract

Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution-based procedure for $\mathcal{SHI}$ ontologies that, in case it terminates, produces a datalog rewriting of the ontology. We also show that our procedure necessarily terminates on DL-Lite$^{\mathcal{H},+}_{bool}$ ontologies, an extension of OWL 2 QL with transitive roles and Boolean connectives.



1 Introduction 

Answering conjunctive queries (CQs) over OWL 2 DL ontologies is a computationally hard [Glimm et al., 2008; Lutz, 2008], but key problem in many applications. Thus, considerable effort has been devoted to the development of OWL 2 DL fragments for which query answering is tractable in data complexity, which is measured in the size of the data only. Most languages obtained in this way are Horn: ontologies in such languages can always be translated into first-order Horn clauses. This includes the families of 'lightweight' languages such as DL-Lite [Calvanese et al., 2007], $\mathcal{EL}$ [Baader et al., 2005], and DLP [Grosof et al., 2003] that underpin the QL, EL, and RL profiles of OWL 2, respectively, as well as more expressive languages, such as Horn-$\mathcal{SHIQ}$ [Hustadt et al., 2005] and Horn-$\mathcal{SROIQ}$ [Ortiz et al., 2011].

Query answering can sometimes be implemented via query rewriting: a rewriting of a query $Q$ w.r.t. an ontology $\mathcal{T}$ is another query $Q'$ that captures all information from $\mathcal{T}$ necessary to answer $Q$ over an arbitrary data set. Unions of conjunctive queries (UCQs) and datalog are common target languages for query rewriting. They ensure tractability w.r.t. data complexity, while enabling the reuse of optimised data management systems: UCQs can be answered using relational databases [Calvanese et al., 2007], and datalog queries can be answered using rule-based systems such as OWLim [Bishop et al., 2011] and Oracle's Semantic Data Store [Wu et al., 2008]. Query rewriting algorithms have so far been developed mainly for Horn fragments of OWL 2 DL, and they have been implemented in systems such as QuOnto [Acciarri et al., 2005], Rapid [Chortaras et al., 2011], Presto [Rosati and Almatelli, 2010], Quest [Rodriguez-Muro and Calvanese, 2012], Clipper [Eiter et al., 2012], Owlgres [Stocker and Smith, 2008], and Requiem [Perez-Urbina et al., 2010].

Horn fragments of OWL 2 DL cannot capture disjunctive knowledge, such as 'every student is either an undergraduate or a graduate'. Such knowledge occurs in practice in ontologies such as the NCI Thesaurus and the Foundational Model of Anatomy, so these ontologies cannot be processed using known rewriting techniques; furthermore, no query answering technique we are aware of is tractable w.r.t. data complexity when applied to such ontologies. These limitations cannot be easily overcome: query answering in even the basic non-Horn language $\mathcal{ELU}$ is co-NP-hard w.r.t. data complexity [Krisnadhi and Lutz, 2007], and since answering datalog queries is PTIME-complete, it may not be possible to rewrite an arbitrary $\mathcal{ELU}$ ontology into datalog unless PTIME = NP. Furthermore, Lutz and Wolter [2012] showed that tractability w.r.t. data complexity cannot be achieved for an arbitrary non-Horn ontology $\mathcal{T}$ with 'real' disjunctions: for each such $\mathcal{T}$, a query $Q$ exists such that answering $Q$ w.r.t. $\mathcal{T}$ is co-NP-hard.

The result by Lutz and Wolter [2012], however, depends on an interaction between existentially quantified variables in $Q$ and disjunctions in $\mathcal{T}$. Motivated by this observation, we consider the problem of computing datalog rewritings of ground queries (i.e., queries whose answers must map all the variables in $Q$ to constants) over non-Horn ontologies. Apart from allowing us to overcome the negative result by Lutz and Wolter [2012], this also allows us to compute a rewriting of $\mathcal{T}$ that can be used to answer an arbitrary ground query. Such queries form the basis of SPARQL, which makes our results practically relevant. We summarise our results as follows.

In Section 3, we revisit the limits of datalog rewritability for a language as a whole and show that non-rewritability of $\mathcal{ELU}$ ontologies is independent from any complexity-theoretic assumptions. More precisely, we present an $\mathcal{ELU}$ ontology $\mathcal{T}$ for which query answering cannot be decided by a family of monotone circuits of polynomial size; a datalog rewriting of $\mathcal{T}$ would therefore contradict the result by Afrati et al. [1995], who proved that fact entailment in a fixed datalog program can be decided using monotone circuits of polynomial size. Thus, instead of relying on complexity arguments, we compare the lengths of proofs in $\mathcal{ELU}$ and datalog and show that the proofs in $\mathcal{ELU}$ may be considerably longer than the proofs in datalog.

In Section 4, we present a three-step procedure that takes a $\mathcal{SHI}$-ontology $\mathcal{T}$ and attempts to rewrite $\mathcal{T}$ into a datalog program. First, we use a novel technique to rewrite $\mathcal{T}$ into a TBox $\Omega_\mathcal{T}$ without transitivity axioms while preserving entailment of all ground atoms; this is in contrast to the standard techniques (see, e.g., [Hustadt et al., 2007]), which preserve entailments only of unary facts and binary facts with roles not having transitive subroles. Second, we use the algorithm by Hustadt et al. [2007] to rewrite $\Omega_\mathcal{T}$ into a disjunctive datalog program $\mathsf{DD}(\Omega_\mathcal{T})$. Third, we adapt the knowledge compilation technique by del Val [2005] and Selman and Kautz [1996] to transform $\mathsf{DD}(\Omega_\mathcal{T})$ into a datalog program. The final step is not guaranteed to terminate in general; however, if it terminates, the resulting program is a rewriting of $\mathcal{T}$.

In Section 4.4, we show that our procedure always terminates if $\mathcal{T}$ is a DL-Lite$^{\mathcal{H},+}_{bool}$-ontology, a practically-relevant language that extends OWL 2 QL with transitive roles and Boolean connectives. Artale et al. [2009] proved that the data complexity of concept queries in this language is tractable (i.e., NLOGSPACE-complete). We extend this result to all ground queries and thus obtain a goal-oriented rewriting algorithm that may be suitable for practical use.

Our technique, as well as most rewriting techniques known 
in the literature, is based on a sound inference system and thus 
produces only strong rewritings — that is, rewritings entailed 
by the original ontology. In Section 5 we show that non-Horn 
ontologies exist that can be rewritten into datalog, but that 
have no strong rewritings. This highlights the limits of techniques based on sound inferences. It is also surprising since all rewriting techniques for Horn fragments of OWL 2 DL known to us produce only strong rewritings.

The proofs of all of our technical results are given in ap- 
pendices A-F. 

2 Preliminaries 

We consider first-order logic without equality and function symbols. Variables, terms, (ground) atoms, literals, formulae, sentences, interpretations $I = (\Delta^I, \cdot^I)$, models, and entailment ($\models$) are defined as usual. We call a finite set of facts (i.e., ground atoms) an ABox. We write $\varphi(\vec{x})$ to stress that a first-order formula $\varphi$ has free variables $\vec{x} = x_1, \dots, x_n$.

Resolution Theorem Proving 

We use the standard notions of (Horn) clauses, substitutions (i.e., mappings of variables to terms), and most general unifiers (MGUs). We often identify a clause with the set of its literals. Positive factoring (PF) and binary resolution (BR) are as follows, where $\sigma$ is the MGU of atoms $A$ and $B$:

PF: $\dfrac{C \lor A \lor B}{C\sigma \lor A\sigma}$        BR: $\dfrac{C \lor A \qquad D \lor \neg B}{(C \lor D)\sigma}$

A clause $C$ is a tautology if it contains literals $A$ and $\neg A$. A clause $C$ subsumes a clause $D$ if a substitution $\sigma$ exists such that each literal in $C\sigma$ occurs in $D$. Furthermore, $C$ $\theta$-subsumes $D$ if $C$ subsumes $D$ and $C$ has no more literals than $D$. Finally, $C$ is redundant in a set of clauses $S$ if $C$ is a tautology or if $C$ is $\theta$-subsumed by another clause in $S$.
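As a small worked instance of the two rules, consider the following inferences. The predicates $P$, $Q$, $R$ and the constant $a$ are illustrative only and are not taken from the paper; in both cases the MGU of $Q(x,y)$ and $Q(a,z)$ is $\sigma = \{x \mapsto a,\ z \mapsto y\}$.

```latex
% Positive factoring with C = P(x), A = Q(x,y), B = Q(a,z)
\text{PF:}\quad
\frac{P(x) \lor Q(x,y) \lor Q(a,z)}{P(a) \lor Q(a,y)}
\qquad\qquad
% Binary resolution with C = P(x), A = Q(x,y), D = R(z), B = Q(a,z)
\text{BR:}\quad
\frac{P(x) \lor Q(x,y) \qquad R(z) \lor \neg Q(a,z)}{P(a) \lor R(y)}
```

The factor $P(a) \lor Q(a,y)$ subsumes its premise $P(x) \lor Q(x,y) \lor Q(a,z)$ only if a suitable substitution exists; here none does, since $a$ cannot be mapped to the variable $x$, so the premise is not made redundant by the factor.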

Datalog and Disjunctive Datalog 

A disjunctive rule $r$ is a function-free first-order sentence of the form $\forall \vec{x} \forall \vec{z}.[\varphi(\vec{x},\vec{z}) \rightarrow \psi(\vec{x})]$, where tuples of variables $\vec{x}$ and $\vec{z}$ are disjoint, $\varphi(\vec{x},\vec{z})$ is a conjunction of atoms, and $\psi(\vec{x})$ is a disjunction of atoms. Formula $\varphi$ is the body of $r$, and formula $\psi$ is the head of $r$. For brevity, we often omit the quantifiers in a rule. A datalog rule is a disjunctive rule where $\psi(\vec{x})$ is a single atom. A (disjunctive) datalog program $\mathcal{P}$ is a finite set of (disjunctive) datalog rules. Rules obviously correspond to clauses, so we sometimes abuse our definitions and use these two notions as synonyms. The evaluation of $\mathcal{P}$ over an ABox $\mathcal{A}$ is the set $\mathcal{P}(\mathcal{A})$ of facts entailed by $\mathcal{P} \cup \mathcal{A}$.
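The evaluation $\mathcal{P}(\mathcal{A})$ can be illustrated with a naive bottom-up fixpoint computation. The sketch below is not part of the paper: the representation of atoms as `(predicate, arguments)` pairs, the convention that variables start with `?`, and the names `evaluate`, `RULES`, and `ABOX` are all assumptions made for illustration, and the loop makes no attempt at efficiency.

```python
def is_var(term):
    # Assumed convention: variables start with "?", anything else is a constant.
    return term.startswith("?")

def match(atom, fact, subst):
    """Extend subst so that atom matches the ground fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, c in zip(args, fargs):
        if is_var(a):
            if subst.setdefault(a, c) != c:
                return None
        elif a != c:
            return None
    return subst

def evaluate(rules, abox):
    """Return all facts entailed by the datalog rules together with the ABox."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            substs = [{}]
            for atom in body:
                # Join the partial substitutions with every matching fact.
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new = (head[0], tuple(s.get(t, t) for t in head[1]))
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

# Two hypothetical rules: PhDCo(x) -> GrCo(x) and takes(x,y), GrCo(y) -> GrSt(x).
RULES = [
    ([("PhDCo", ("?x",))], ("GrCo", ("?x",))),
    ([("takes", ("?x", "?y")), ("GrCo", ("?y",))], ("GrSt", ("?x",))),
]
ABOX = {("takes", ("ann", "c1")), ("PhDCo", ("c1",))}
```

Calling `evaluate(RULES, ABOX)` adds `GrCo(c1)` and then `GrSt(ann)` to the two given facts. Each round only ever adds facts over the constants of the ABox, which is why the loop terminates after polynomially many rounds for a fixed program.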

Ontologies and Description Logics 

A DL signature is a disjoint union of sets of atomic concepts, atomic roles, and individuals. A role is an atomic role or an inverse role $R^-$ for $R$ an atomic role; furthermore, let $\mathsf{inv}(R) = R^-$ and $\mathsf{inv}(R^-) = R$. A concept is an expression of the form $\top$, $\bot$, $A$, $\neg C$, $C_1 \sqcap C_2$, $C_1 \sqcup C_2$, $\exists R.C$, $\forall R.C$, or $\exists R.\mathsf{self}$, where $A$ is an atomic concept, $C_{(i)}$ are concepts, and $R$ is a role. Concepts $\exists R.\mathsf{self}$ correspond to atoms $R(x,x)$ and are typically not included in $\mathcal{SHI}$; however, we use this minor extension in Section 4.1. A $\mathcal{SHI}$-TBox $\mathcal{T}$, often called an ontology, is a finite set of axioms of the form $R_1 \sqsubseteq R_2$ (role inclusion axioms or RIAs), $\mathsf{Tra}(R)$ (transitivity axioms), and $C_1 \sqsubseteq C_2$ (general concept inclusions or GCIs), where $R_{(i)}$ are roles and $C_{(i)}$ are concepts. Axiom $C_1 \equiv C_2$ abbreviates $C_1 \sqsubseteq C_2$ and $C_2 \sqsubseteq C_1$. Relation $\sqsubseteq^*_\mathcal{T}$ is the smallest reflexively-transitively closed relation such that $R \sqsubseteq^*_\mathcal{T} S$ and $\mathsf{inv}(R) \sqsubseteq^*_\mathcal{T} \mathsf{inv}(S)$ for each $R \sqsubseteq S \in \mathcal{T}$. A role $R$ is transitive in $\mathcal{T}$ if $\mathsf{Tra}(R) \in \mathcal{T}$ or $\mathsf{Tra}(\mathsf{inv}(R)) \in \mathcal{T}$. Satisfaction of a $\mathcal{SHI}$-TBox $\mathcal{T}$ in an interpretation $I = (\Delta^I, \cdot^I)$, written $I \models \mathcal{T}$, is defined as usual [Baader et al., 2003].

An $\mathcal{ALCHI}$-TBox is a $\mathcal{SHI}$-TBox with no transitivity axioms. An $\mathcal{ELU}$-TBox is an $\mathcal{ALCHI}$-TBox with no role inclusion axioms, inverse roles, concepts $\exists R.\mathsf{self}$, or symbols $\bot$, $\forall$, and $\neg$. A DL-Lite$^{\mathcal{H},+}_{bool}$-TBox is a $\mathcal{SHI}$-TBox that does not contain concepts of the form $\forall R.C$, and where $C = \top$ for each concept of the form $\exists R.C$. The notion of acyclic TBoxes is defined as usual [Baader et al., 2003].

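A $\mathcal{SHI}$-TBox $\mathcal{T}$ is normalised if $\forall$ does not occur in $\mathcal{T}$, and $\exists$ occurs in $\mathcal{T}$ only in axioms of the form $\exists R.C \sqsubseteq A$, $\exists R.\mathsf{self} \sqsubseteq A$, $A \sqsubseteq \exists R.C$, or $A \sqsubseteq \exists R.\mathsf{self}$. Each $\mathcal{SHI}$-TBox $\mathcal{T}$ can be transformed in polynomial time into a normalised $\mathcal{SHI}$-TBox that is a model-conservative extension of $\mathcal{T}$.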

Queries and Datalog Rewritings 

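A ground query (or just a query) $Q(\vec{x})$ is a conjunction of function-free atoms. A substitution $\sigma$ mapping $\vec{x}$ to constants is an answer to $Q(\vec{x})$ w.r.t. a set $\mathcal{F}$ of first-order sentences and an ABox $\mathcal{A}$ if $\mathcal{F} \cup \mathcal{A} \models Q(\vec{x})\sigma$; furthermore, $\mathsf{cert}(Q, \mathcal{F}, \mathcal{A})$ is the set of all answers to $Q(\vec{x})$ w.r.t. $\mathcal{F}$ and $\mathcal{A}$.

Let $Q$ be a query. A datalog program $\mathcal{P}$ is a $Q$-rewriting of a finite set of sentences $\mathcal{F}$ if $\mathsf{cert}(Q, \mathcal{F}, \mathcal{A}) = \mathsf{cert}(Q, \mathcal{P}, \mathcal{A})$ for each ABox $\mathcal{A}$. The program $\mathcal{P}$ is a rewriting of $\mathcal{F}$ if $\mathcal{P}$ is a $Q$-rewriting of $\mathcal{F}$ for each query $Q$. Such rewritings are strong if, in addition, we also have $\mathcal{F} \models \mathcal{P}$.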

3 The Limits of Datalog Rewritability 

Datalog programs can be evaluated over an ABox $\mathcal{A}$ in polynomial time in the size of $\mathcal{A}$; hence, a co-NP-hard property of $\mathcal{A}$ cannot be decided by evaluating a fixed datalog program over $\mathcal{A}$ unless PTIME = NP. Krisnadhi and Lutz [2007] showed that answering ground queries is co-NP-hard in data complexity even for acyclic TBoxes expressed in $\mathcal{ELU}$, the simplest non-Horn extension of the basic description logic $\mathcal{EL}$. Thus, under standard complexity-theoretic assumptions, an acyclic $\mathcal{ELU}$-TBox $\mathcal{T}$ and a ground query $Q$ exist for which there is no $Q$-rewriting of $\mathcal{T}$. In this section, we show that this holds even if PTIME = NP.

Theorem 1. An acyclic $\mathcal{ELU}$-TBox $\mathcal{T}$ and a ground CQ $Q$ exist such that $\mathcal{T}$ is not $Q$-rewritable.

Our proof uses several notions from circuit complexity [Wegener, 1987], and results of this flavour compare the sizes of proofs in different formalisms; thus, our result essentially says that proofs in $\mathcal{ELU}$ can be significantly longer than proofs in datalog. Let $<$ be the ordering on Boolean values defined by $\mathsf{f} < \mathsf{t}$; then, a Boolean function $f$ with $n$ inputs is monotone if $f(x_1, \dots, x_n) \le f(y_1, \dots, y_n)$ holds for all $n$-tuples of Boolean values $x_1, \dots, x_n$ and $y_1, \dots, y_n$ such that $x_i \le y_i$ for each $1 \le i \le n$. A decision problem can be seen as a family of Boolean functions $\{f_n\}$, where $f_n$ decides membership of each $n$-bit input. If each function $f_n$ is monotone, then $f_n$ can be realised by a monotone Boolean circuit $C_n$ (i.e., a circuit with $n$ input gates where all internal gates are AND- or OR-gates with unrestricted fan-in); the size of $C_n$ is the number of its edges. The family of circuits $\{C_n\}$ corresponding to $\{f_n\}$ has polynomial size if a polynomial $p(x)$ exists such that the size of each $C_n$ is bounded by $p(n)$.

We recall how non-3-colorability of an undirected graph $G$ with $s$ vertices corresponds to monotone Boolean functions. The maximum number of edges in $G$ is $m(s) = s(s-1)/2$, so graph $G$ is encoded as a string $\vec{x}$ of $m(s)$ bits, where bit $x_{i,j}$, $1 \le i < j \le s$, is $\mathsf{t}$ if and only if $G$ contains an edge between vertices $i$ and $j$. The non-3-colorability problem can then be seen as a family of Boolean functions $\{f_{m(s)}\}$, where function $f_{m(s)}$ handles all graphs with $s$ vertices and it evaluates to $\mathsf{t}$ on an input $\vec{x}$ iff the graph corresponding to $\vec{x}$ is non-3-colourable. Functions $f_n$ such that $n \neq m(s)$ for all $s$ are irrelevant since no graph is encoded using that many bits.
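The encoding and the monotonicity of $f_{m(s)}$ can be checked by brute force on small graphs. The sketch below is an illustration only: the function names `encode` and `non_3_colourable` are hypothetical, and the exhaustive search over colourings is exponential, so it says nothing about circuit size.

```python
from itertools import combinations, product

def encode(s, edges):
    """Encode a graph on vertices 1..s as m(s) = s(s-1)/2 bits, one per pair i < j."""
    edge_set = {frozenset(e) for e in edges}
    return tuple(frozenset((i, j)) in edge_set
                 for i, j in combinations(range(1, s + 1), 2))

def non_3_colourable(s, bits):
    """f_{m(s)}: True iff the encoded graph has no proper 3-colouring."""
    # Pairs are enumerated in the same order as in encode, shifted to 0-based indices.
    pairs = list(combinations(range(s), 2))
    edges = [p for p, bit in zip(pairs, bits) if bit]
    for colouring in product(range(3), repeat=s):
        if all(colouring[i] != colouring[j] for i, j in edges):
            return False
    return True

# The complete graph on 4 vertices, and the same graph without the edge {3, 4}.
K4 = encode(4, [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
K4_MINUS = encode(4, [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)])
```

Here `non_3_colourable(4, K4)` holds while `non_3_colourable(4, K4_MINUS)` does not. Monotonicity reflects the fact that adding edges can only destroy proper colourings, so flipping a bit from $\mathsf{f}$ to $\mathsf{t}$ never turns the function value from $\mathsf{t}$ to $\mathsf{f}$.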

We prove our claim using a result by Afrati et al. [1995]: if 
a decision problem cannot be solved using a family of mono- 
tone circuits of polynomial size, then the problem also cannot 
be solved by evaluating a fixed datalog program, regardless of 
the problem's complexity. We restate the result as follows. 

Theorem 2. [Adapted from Afrati et al., 1995]

1. Let $\mathcal{P}$ be a fixed datalog program, and let $\alpha$ be a fixed fact. Then, for an ABox $\mathcal{A}$, deciding $\mathcal{P} \cup \mathcal{A} \models \alpha$ can be solved by monotone circuits of polynomial size.

2. The non-3-colorability problem cannot be solved by monotone circuits of polynomial size.



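Table 1: Example TBox $\mathcal{T}_{ex}$

$\gamma_1$   Student $\sqsubseteq$ GrSt $\sqcup$ UnGrSt
$\gamma_2$   Course $\sqsubseteq$ GrCo $\sqcup$ UnGrCo
$\gamma_3$   PhDSt $\sqsubseteq$ $\exists$takes.PhDCo
$\gamma_4$   PhDCo $\sqsubseteq$ GrCo
$\gamma_5$   $\exists$takes.GrCo $\sqsubseteq$ GrSt
$\gamma_6$   UnGrSt $\sqcap$ $\exists$takes.GrCo $\sqsubseteq$ $\bot$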

To prove Theorem 1, we present a TBox $\mathcal{T}$ and a ground CQ $Q$ that decide non-3-colorability of a graph encoded as an ABox. Next, we present a family of monotone Boolean functions $\{g_{n(u)}\}$ that decide answering $Q$ w.r.t. $\mathcal{T}$ and an arbitrary ABox $\mathcal{A}$. Next, we show that a monotone circuit for an arbitrary $f_{m(s)}$ can be obtained by a size-preserving transformation from a circuit for some $g_{n(u)}$; thus, by Item 2 of Theorem 2, answering $Q$ w.r.t. $\mathcal{T}$ cannot be solved using monotone circuits of polynomial size. Finally, we show that existence of a rewriting for $Q$ and $\mathcal{T}$ contradicts Item 1 of Theorem 2.

4 Computing Rewritings via Resolution 

Theorem 1 is rather discouraging since it applies to one of the 
simplest non-Horn languages. The theorem's proof, however, 
relies on a specific TBox T that encodes a hard problem (i.e., 
non-3-colorability) that is not solvable by monotone circuits 
of polynomial size. One can expect that non-Horn TBoxes 
used in practice do not encode such hard problems, and so it 
might be possible to rewrite such TBoxes into datalog. 

We illustrate this intuition using the TBox $\mathcal{T}_{ex}$ shown in Table 1. Axioms $\gamma_4$-$\gamma_6$ correspond to datalog rules, whereas axioms $\gamma_1$-$\gamma_3$ represent disjunctive and existentially quantified knowledge and thus do not correspond to datalog rules. We will show that $\mathcal{T}_{ex}$ can, in fact, be rewritten into datalog using a generic three-step method that takes a normalised $\mathcal{SHI}$-TBox $\mathcal{T}$ and proceeds as follows.

S1 Eliminate the transitivity axioms from $\mathcal{T}$ by transforming $\mathcal{T}$ into an $\mathcal{ALCHI}$-TBox $\Omega_\mathcal{T}$ and a set of datalog rules $\Xi_\mathcal{T}$ such that facts entailed by $\mathcal{T} \cup \mathcal{A}$ and $\Omega_\mathcal{T} \cup \Xi_\mathcal{T}(\mathcal{A})$ coincide for each ABox $\mathcal{A}$. This step extends the known technique to make it complete for facts with roles that have transitive subroles in $\mathcal{T}$.

S2 Apply the algorithm by Hustadt et al. [2007] to transform $\Omega_\mathcal{T}$ into a disjunctive datalog program $\mathsf{DD}(\Omega_\mathcal{T})$.

S3 Transform $\mathsf{DD}(\Omega_\mathcal{T})$ into a set of datalog rules $\mathcal{P}_H$ using a variant of the knowledge compilation techniques by Selman and Kautz [1996] and del Val [2005].

Step S3 may not terminate for an arbitrary $\mathcal{SHI}$-TBox $\mathcal{T}$; however, if it terminates (i.e., if $\mathcal{P}_H$ is finite), then $\mathcal{P}_H \cup \Xi_\mathcal{T}$ is a rewriting of $\mathcal{T}$. Furthermore, in Section 4.4 we show that step S3 always terminates if $\mathcal{T}$ is a DL-Lite$^{\mathcal{H},+}_{bool}$-TBox. We thus obtain what is, to the best of our knowledge, the first goal-oriented rewriting algorithm for a practically-relevant non-Horn fragment of OWL 2 DL.

4.1 Transitivity 

We first recapitulate the standard technique for eliminating transitivity axioms from $\mathcal{SHI}$-TBoxes.



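Definition 3. Let $\mathcal{T}$ be a normalised $\mathcal{SHI}$-TBox, and let $\Theta_\mathcal{T}$ be obtained from $\mathcal{T}$ by removing all transitivity axioms. If $\mathcal{T}$ is a DL-Lite$^{\mathcal{H},+}_{bool}$-TBox, then let $\Upsilon_\mathcal{T} = \Theta_\mathcal{T}$; otherwise, let $\Upsilon_\mathcal{T}$ be the extension of $\Theta_\mathcal{T}$ with axioms

$\exists R.A \sqsubseteq C_{B,R}$        $\exists R.C_{B,R} \sqsubseteq C_{B,R}$        $C_{B,R} \sqsubseteq B$

for each axiom $\exists S.A \sqsubseteq B \in \mathcal{T}$ and each transitive role $R$ in $\mathcal{T}$ such that $R \sqsubseteq^*_\mathcal{T} S$, where $C_{B,R}$ is a fresh atomic concept unique for $B$ and $R$.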

This encoding preserves entailment of all facts of the form $C(c)$ and $U(c,d)$ if $U$ has no transitive subroles: this was proved by Artale et al. [2009] for DL-Lite$^{\mathcal{H},+}_{bool}$, and by Simančík [2012] for $\mathcal{SHI}$. Example 4, however, shows that the encoding is incomplete if $U$ has transitive subroles.

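Example 4. Let $\mathcal{T}$ be the TBox below, and let $\mathcal{A} = \{A(a)\}$.

$A \sqsubseteq \exists S.B$        $S \sqsubseteq R$        $S \sqsubseteq R^-$        $\mathsf{Tra}(R)$

Then, $\Upsilon_\mathcal{T} = \mathcal{T} \setminus \{\mathsf{Tra}(R)\}$, and one can easily verify that $\mathcal{T} \cup \mathcal{A} \models R(a,a)$, but $\Upsilon_\mathcal{T} \cup \mathcal{A} \not\models R(a,a)$. Note, however, that the missing inference can be recovered by extending $\Upsilon_\mathcal{T}$ with the axiom $A \sqsubseteq \exists R.\mathsf{self}$, which is a consequence of $\mathcal{T}$.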
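The entailment $\mathcal{T} \cup \mathcal{A} \models R(a,a)$ claimed in Example 4 can be spelled out step by step. In the derivation below, $d$ denotes the (possibly anonymous) element whose existence the first axiom guarantees in an arbitrary model; the name $d$ is introduced here for illustration.

```latex
\begin{align*}
A(a),\; A \sqsubseteq \exists S.B &\;\Longrightarrow\; S(a,d) \text{ and } B(d) \text{ for some element } d \\
S(a,d),\; S \sqsubseteq R        &\;\Longrightarrow\; R(a,d) \\
S(a,d),\; S \sqsubseteq R^-      &\;\Longrightarrow\; R(d,a) \\
R(a,d),\; R(d,a),\; \mathsf{Tra}(R) &\;\Longrightarrow\; R(a,a)
\end{align*}
```

The last step is the only one that uses transitivity, and it closes a cycle through the element $d$; this is why removing $\mathsf{Tra}(R)$ loses the fact $R(a,a)$ even though $a$ is the only named individual.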

The intuitions from Example 4 are formalised in Definition 5. Roughly speaking, we transform the transitivity and role inclusion axioms in $\mathcal{T}$ into a datalog program $\Xi_\mathcal{T}$, which we apply to $\mathcal{A}$ 'first'; that is, we compute $\Xi_\mathcal{T}(\mathcal{A})$ independently from any GCIs. To recoup the remaining consequences of the form $R(a,a)$, we extend $\Upsilon_\mathcal{T}$ with sufficiently many axioms of the form $A \sqsubseteq \exists R.\mathsf{self}$ that are entailed by $\mathcal{T}$; this is possible since we assume that $\mathcal{T}$ is normalised.

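Definition 5. Let $\mathcal{T}$ be a normalised $\mathcal{SHI}$-TBox. Then, $\Omega_\mathcal{T}$ is the TBox obtained by extending $\Upsilon_\mathcal{T}$ with an axiom $A \sqsubseteq \exists R.\mathsf{self}$ for each atomic concept $A$ and each atomic role $R$ such that $R$ is transitive in $\mathcal{T}$, and $A \sqsubseteq \exists S.B \in \mathcal{T}$ for some concept $B$ and role $S$ with $S \sqsubseteq^*_\mathcal{T} R$ and $S \sqsubseteq^*_\mathcal{T} R^-$. Furthermore, $\Xi_\mathcal{T}$ is the set of datalog rules corresponding to the role inclusion and transitivity axioms in $\mathcal{T}$.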
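Applying the rules for role inclusions and transitivity to the role assertions of an ABox amounts to a simple closure computation. The sketch below is an assumption-laden illustration, not the paper's implementation: it handles only binary facts, represents an inclusion as a hypothetical triple `(S, R, inverse)`, and uses the roles of Example 4 with a second named individual `b` added to the ABox.

```python
def role_closure(role_facts, inclusions, transitive):
    """Saturate binary facts (role, x, y) under role inclusions and transitivity.

    inclusions: triples (S, R, inverse); inverse=False encodes S(x,y) -> R(x,y),
                inverse=True encodes S(x,y) -> R(y,x).
    transitive: set of role names R for which R(x,y), R(y,z) -> R(x,z) holds.
    """
    facts = set(role_facts)
    while True:
        new = set()
        for (s, r, inverse) in inclusions:
            for (p, x, y) in facts:
                if p == s:
                    new.add((r, y, x) if inverse else (r, x, y))
        for (p, x, y) in facts:
            if p in transitive:
                for (q, y2, z) in facts:
                    if q == p and y2 == y:
                        new.add((p, x, z))
        if new <= facts:
            return facts
        facts |= new

# Roles of Example 4: S is included in R and in the inverse of R, and R is transitive.
INCLUSIONS = [("S", "R", False), ("S", "R", True)]
TRANSITIVE = {"R"}
CLOSED = role_closure({("S", "a", "b")}, INCLUSIONS, TRANSITIVE)
```

With the named successor `b`, the closure contains `R(a,a)` directly. When the successor is anonymous, as in Example 4, no such fact is present in the ABox for the rules to use, which is the gap the additional axioms $A \sqsubseteq \exists R.\mathsf{self}$ are meant to fill.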

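Theorem 6. Let $\mathcal{T}$ be a normalised $\mathcal{SHI}$-TBox, let $\mathcal{A}$ be an ABox, and let $\alpha$ be a fact. Then, $\mathcal{T} \cup \mathcal{A} \models \alpha$ if and only if $\Omega_\mathcal{T} \cup \Xi_\mathcal{T}(\mathcal{A}) \models \alpha$.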

Note that, if $\mathcal{T}$ is normalised, so is $\Omega_\mathcal{T}$. Furthermore, to ensure decidability, roles involving transitive subroles are not allowed to occur in $\mathcal{T}$ in number restrictions, and so Theorem 6 holds even if $\mathcal{T}$ is a $\mathcal{SROIQ}$-TBox.

4.2 From DLs to Disjunctive Datalog 

Step S2 of our rewriting algorithm uses the technique by Hustadt et al. [2007] for transforming an $\mathcal{ALCHI}$-TBox $\mathcal{T}$ into a disjunctive datalog program $\mathsf{DD}(\mathcal{T})$ such that, for each ABox $\mathcal{A}$, the facts entailed by $\mathcal{T} \cup \mathcal{A}$ and $\mathsf{DD}(\mathcal{T}) \cup \mathcal{A}$ coincide. By eliminating the existential quantifiers in $\mathcal{T}$, one thus reduces a reasoning problem in $\mathcal{T} \cup \mathcal{A}$ to a reasoning problem in $\mathsf{DD}(\mathcal{T}) \cup \mathcal{A}$. The following definition summarises the properties of the programs produced by the transformation.

Definition 7. A disjunctive datalog program $\mathcal{P}$ is nearly-monadic if its rules can be partitioned into two disjoint sets, $\mathcal{P}^m$ and $\mathcal{P}^r$, such that

1. each rule $r \in \mathcal{P}^m$ mentions only unary and binary predicates and each atom in the head of $r$ is of the form $A(z)$ or $R(z,z)$ for some variable $z$, and

2. each rule $r \in \mathcal{P}^r$ is of the form $R(x,y) \rightarrow S(x,y)$ or $R(x,y) \rightarrow S(y,x)$.

A disjunctive rule $r$ is simple if there exists a variable $x$ such that each atom in the body of $r$ is of the form $A_i(x)$, $R_i(x,x)$, $S_i(x,y_i)$, or $T_i(y_i,x)$, each atom in the head of $r$ is of the form $U_i(x,x)$ or $B_i(x)$, and each variable $y_i$ occurs in $r$ at most once. Furthermore, a nearly-monadic program $\mathcal{P}$ is simple if each rule in $\mathcal{P}^m$ is simple.

Table 2: Example Disjunctive Program $\mathsf{DD}(\mathcal{T}_{ex})$

$C_1$   $\neg$Student$(x)$ $\lor$ GrSt$(x)$ $\lor$ UnGrSt$(x)$
$C_2$   $\neg$Course$(x)$ $\lor$ GrCo$(x)$ $\lor$ UnGrCo$(x)$
$C_3$   $\neg$PhDSt$(x)$ $\lor$ GrSt$(x)$
$C_4$   $\neg$PhDCo$(x)$ $\lor$ GrCo$(x)$
$C_5$   $\neg$takes$(x,y)$ $\lor$ $\neg$GrCo$(y)$ $\lor$ GrSt$(x)$
$C_6$   $\neg$UnGrSt$(x)$ $\lor$ $\neg$takes$(x,y)$ $\lor$ $\neg$GrCo$(y)$

Theorem 8 follows mainly from the results by Hustadt et al. [2007]; we just argue that concepts $\exists R.\mathsf{self}$ do not affect the algorithm, and that $\mathsf{DD}(\mathcal{T})$ satisfies property 1.

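Theorem 8. For $\mathcal{T}$ a normalised $\mathcal{ALCHI}$-TBox, $\mathsf{DD}(\mathcal{T})$ satisfies the following:

1. program $\mathsf{DD}(\mathcal{T})$ is nearly-monadic; furthermore, if $\mathcal{T}$ is a DL-Lite$^{\mathcal{H},+}_{bool}$-TBox, then $\mathsf{DD}(\mathcal{T})$ is also simple;

2. $\mathcal{T} \models \mathsf{DD}(\mathcal{T})$; and

3. $\mathsf{cert}(Q, \mathcal{T}, \mathcal{A}) = \mathsf{cert}(Q, \mathsf{DD}(\mathcal{T}), \mathcal{A})$ for each ABox $\mathcal{A}$ and each ground query $Q$.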

Example 9. When applied to the TBox $\mathcal{T}_{ex}$ in Table 1, this algorithm produces the disjunctive program $\mathsf{DD}(\mathcal{T}_{ex})$ shown (as clauses) in Table 2. In particular, axiom $\gamma_3$ is eliminated since it contains an existential quantifier, but its effects are compensated by clause $C_3$. Clauses $C_1$-$C_2$ and $C_4$-$C_6$ are obtained from axioms $\gamma_1$-$\gamma_2$ and $\gamma_4$-$\gamma_6$, respectively.

4.3 From Disjunctive Datalog to Datalog 

Step S3 of our rewriting algorithm attempts to transform the 
disjunctive program obtained in Step S2 into a datalog pro- 
gram such that, for each ABox A, the two programs entail the 
same facts. This is achieved using known knowledge compi- 
lation techniques, which we survey next. 

Resolution-Based Knowledge Compilation 

In their seminal paper, Selman and Kautz [1996] proposed an algorithm for compiling a set of propositional clauses $S$ into a set of Horn clauses $S_H$ such that the Horn consequences of $S$ and $S_H$ coincide. Subsequently, del Val [2005] generalised this algorithm to the case when $S$ contains first-order clauses, but without any termination guarantees; Procedure 1 paraphrases this algorithm. The algorithm applies to $S$ binary resolution and positive factoring from resolution theorem proving, and it keeps only the consequences that are not redundant according to Definition 10. Unlike standard resolution, the algorithm maintains two sets $S_H$ and $S_{\overline{H}}$ of Horn and non-Horn clauses, respectively; furthermore, the algorithm never resolves two Horn clauses.

Procedure 1 Compile-Horn

Input: $S$: set of clauses
Output: $S_H$: set of Horn clauses

 1: $S_H := \{C \in S \mid C$ is a Horn clause and not a tautology$\}$
 2: $S_{\overline{H}} := \{C \in S \mid C$ is a non-Horn clause and not a tautology$\}$
 3: repeat
 4:     Compute all relevant consequences of $\langle S_H, S_{\overline{H}} \rangle$
 5:     for each relevant consequence $C$ of $\langle S_H, S_{\overline{H}} \rangle$ do
 6:         Delete from $S_H$ and $S_{\overline{H}}$ all clauses $\theta$-subsumed by $C$
 7:         if $C$ is Horn then $S_H := S_H \cup \{C\}$
 8:         else $S_{\overline{H}} := S_{\overline{H}} \cup \{C\}$
 9: until there is no relevant consequence of $\langle S_H, S_{\overline{H}} \rangle$
10: return $S_H$

Definition 10. Let $S_H$ and $S_{\overline{H}}$ be sets of Horn and non-Horn clauses, respectively. A clause $C$ is a relevant consequence of $\langle S_H, S_{\overline{H}} \rangle$ if

• $C$ is not redundant in $S_H \cup S_{\overline{H}}$, and

• $C$ is a factor of a clause $C_1 \in S_{\overline{H}}$, or a resolvent of clauses $C_1 \in S_{\overline{H}}$ and $C_2 \in S_{\overline{H}} \cup S_H$.
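For propositional clauses, which is the setting of Selman and Kautz [1996], Procedure 1 and Definition 10 can be sketched in a few lines. The code below is an illustration under simplifying assumptions rather than the algorithm of del Val [2005]: clauses are sets of `(atom, is_positive)` literals, so factoring is implicit and subsumption reduces to set inclusion; the names `compile_horn` and `EXAMPLE` are hypothetical.

```python
def is_horn(clause):
    # A Horn clause has at most one positive literal.
    return sum(1 for (_, positive) in clause if positive) <= 1

def is_tautology(clause):
    return any((atom, not positive) in clause for (atom, positive) in clause)

def resolvents(c1, c2):
    """All propositional binary resolvents of clauses c1 and c2."""
    return {(c1 - {(atom, pos)}) | (c2 - {(atom, not pos)})
            for (atom, pos) in c1 if (atom, not pos) in c2}

def compile_horn(clauses):
    clauses = {frozenset(c) for c in clauses}
    clauses = {c for c in clauses if not is_tautology(c)}
    horn = {c for c in clauses if is_horn(c)}
    non_horn = clauses - horn
    while True:
        everything = horn | non_horn
        # Relevant consequences: non-redundant resolvents with a non-Horn premise.
        relevant = {r for c1 in non_horn for c2 in everything
                    for r in resolvents(c1, c2)
                    if not is_tautology(r) and not any(d <= r for d in everything)}
        if not relevant:
            return horn
        for c in relevant:
            if any(d <= c for d in horn | non_horn):
                continue  # subsumed by a clause added earlier in this round
            horn = {d for d in horn if not c <= d}
            non_horn = {d for d in non_horn if not c <= d}
            (horn if is_horn(c) else non_horn).add(c)

# a or b, a implies c, b implies c: the only Horn consequence worth keeping is c.
EXAMPLE = [
    {("a", True), ("b", True)},
    {("a", False), ("c", True)},
    {("b", False), ("c", True)},
]
```

On `EXAMPLE`, the first round derives the non-Horn clauses $b \lor c$ and $a \lor c$, the second round derives the unit clause $c$, and every Horn clause subsumed by $c$ is then deleted, so `compile_horn(EXAMPLE)` returns the single clause $c$. Two Horn clauses are never resolved with each other because the first premise is always drawn from `non_horn`.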

Theorem 11 recapitulates the algorithm's properties. It essentially shows that, even if the algorithm never terminates, each Horn consequence of $S$ will at some point during the algorithm's execution become entailed by the set of Horn clauses $S_H$ computed by the algorithm. The theorem was proved by showing that each resolution proof of a consequence of $S$ can be transformed to 'postpone' all resolution steps between two Horn clauses until the end; thus, one can 'precompute' the set $S_H$ of all consequences of $S$ derivable using a non-Horn clause.

Theorem 11. ([del Val, 2005]) Let $S$ be a set of clauses, and let $C$ be a Horn clause such that $S \models C$, and assume that Procedure 1 is applied to $S$. Then, after some finite number of iterations of the loop in lines 3-9, we have $S_H \models C$.

ABox-Independent Compilation 

Compiling knowledge into Horn clauses and computing datalog rewritings are similar in spirit: both transform one theory into another while ensuring that the two theories are indistinguishable w.r.t. a certain class of queries. There is, however, an important difference: given a disjunctive program $\mathcal{P}$ and a fixed ABox $\mathcal{A}$, one could apply Procedure 1 to $S = \mathcal{P} \cup \mathcal{A}$ to obtain a datalog program $S_H$, but such $S_H$ would not necessarily be independent from the specific ABox $\mathcal{A}$. In contrast, a rewriting of $\mathcal{P}$ is a datalog program $\mathcal{P}_H$ that can be freely combined with an arbitrary ABox $\mathcal{A}$. We next show that a program $\mathcal{P}_H$ satisfying the latter requirement can be obtained by applying Procedure 1 to $\mathcal{P}$ only.

Towards this goal, we generalise Theorem 11 and show that, when applied to an arbitrary set of first-order clauses $\mathcal{N}$, Procedure 1 computes a set of Horn clauses $\mathcal{N}_H$ such that the Horn consequences of $\mathcal{N} \cup \mathcal{A}$ and $\mathcal{N}_H \cup \mathcal{A}$ coincide for an arbitrary ABox $\mathcal{A}$. Intuitively, this shows that, when Procedure 1 is applied to $S = \mathcal{N} \cup \mathcal{A}$, all inferences involving facts in $\mathcal{A}$ can be 'moved' to the end of derivations.



Theorem 12. Let $\mathcal{N}$ be a set of clauses, let $\mathcal{A}$ be an ABox, let $C$ be a Horn clause such that $\mathcal{N} \cup \mathcal{A} \models C$, and assume that Procedure 1 is applied to $\mathcal{N}$. Then, after some finite number of iterations of the loop in lines 3-9, we have $\mathcal{N}_H \cup \mathcal{A} \models C$.

Rewriting Nearly-Monadic Disjunctive Programs 

The final obstacle to obtaining a datalog rewriting of a $\mathcal{SHI}$-TBox $\mathcal{T}$ is due to Theorem 6: the rules in $\Xi_\mathcal{T}$ should be applied 'before' $\Omega_\mathcal{T}$. While this allows us to transform $\Omega_\mathcal{T}$ into $\mathcal{P} = \mathsf{DD}(\Omega_\mathcal{T})$ and $\mathcal{P}_H$ without taking $\Xi_\mathcal{T}$ into account, this also means that Theorems 6, 8, and 12 only imply that the facts entailed by $\mathcal{T} \cup \mathcal{A}$ and $\mathcal{P}_H \cup \Xi_\mathcal{T}(\mathcal{A})$ coincide. To obtain a 'true' rewriting, we show in Lemma 13 that program $\mathcal{P}_H$ is nearly-monadic. We use this observation in Theorem 14 to show that each binary fact obtained by applying $\mathcal{P}_H$ to $\Xi_\mathcal{T}(\mathcal{A})$ is of the form $R(c,c)$, and so it cannot 'fire' the rules in $\Xi_\mathcal{T}$; hence, $\mathcal{P}_H \cup \Xi_\mathcal{T}$ is a rewriting of $\mathcal{T}$.

Lemma 13. Let $\mathcal{P}$ be a nearly-monadic program, and assume that Procedure 1 terminates when applied to $\mathcal{P}$ and returns $\mathcal{P}_H$. Then, $\mathcal{P}_H$ is a nearly-monadic datalog program.

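Theorem 14. Let $\mathcal{P} = \mathsf{DD}(\Omega_\mathcal{T})$ for $\mathcal{T}$ an $\mathcal{SHI}$-TBox. If, when applied to $\mathcal{P}$, Procedure 1 terminates and returns $\mathcal{P}_H$, then $\mathcal{P}_H \cup \Xi_\mathcal{T}$ is a rewriting of $\mathcal{T}$.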

Please note that our algorithm (just like all rewriting algo- 
rithms we are aware of) computes rewritings using a sound 
inference system and thus always produces strong rewritings. 

Example 15. When applied to the program $\mathcal{P} = \mathsf{DD}(\mathcal{T}_{ex})$ from Table 2, Procedure 1 resolves $C_2$ and $C_5$ to derive (1), $C_2$ and $C_6$ to derive (2), and $C_1$ and $C_6$ to derive (3).

$\neg$takes$(x,y)$ $\lor$ $\neg$Course$(y)$ $\lor$ GrSt$(x)$ $\lor$ UnGrCo$(y)$    (1)

$\neg$takes$(x,y)$ $\lor$ $\neg$UnGrSt$(x)$ $\lor$ $\neg$Course$(y)$ $\lor$ UnGrCo$(y)$    (2)

$\neg$takes$(x,y)$ $\lor$ $\neg$Student$(x)$ $\lor$ $\neg$GrCo$(y)$ $\lor$ GrSt$(x)$    (3)

Resolving (2) and $C_1$, and (3) and $C_2$ produces redundant clauses, after which the procedure terminates and returns the set $\mathcal{P}_H$ consisting of clauses $C_3$-$C_6$, (2), and (3). By Theorem 14, $\mathcal{P}_H$ is a strong rewriting of $\mathcal{T}_{ex}$.

4.4 Termination

Procedure 1 is not a semi-decision procedure for either strong 
non-rewritability (cf. Example 16) or strong rewritability (cf. 
Example 17) of nearly-monadic programs. 

Example 16. Let $\mathcal{P}$ be defined as follows.

$G(x) \lor B(x)$    (4)
$B(x_1) \lor \neg E(x_1, x_0) \lor \neg G(x_0)$    (5)
$G(x_1) \lor \neg E(x_1, x_0) \lor \neg B(x_0)$    (6)

Clauses (5) and (6) are mutually recursive, but they are also Horn, so Procedure 1 never resolves them directly.

Clauses (5) and (6), however, can interact through clause (4). Resolving (4) and (5) on $\neg G(x_0)$ produces (7); and resolving (6) and (7) on $B(x_1)$ produces (8). By further resolving (8) alternately with (5) and (6), we obtain (9) for each even $n$. By resolving (6) and (9) on $B(x_0)$, we obtain (10). Finally, by factoring (10), we obtain (11) for each even $n$.

$B(x_1) \lor \neg E(x_1, x_0) \lor B(x_0)$    (7)

$G(x_2) \lor \neg E(x_2, x_1) \lor \neg E(x_1, x_0) \lor B(x_0)$    (8)

$G(x_n) \lor [\bigvee_{i=1}^{n} \neg E(x_i, x_{i-1})] \lor B(x_0)$    (9)

$G(x_n) \lor [\bigvee_{i=1}^{n} \neg E(x_i, x_{i-1})] \lor G(x_1') \lor \neg E(x_1', x_0)$    (10)

$G(x_n) \lor \neg E(x_n, x_0) \lor [\bigvee_{i=1}^{n} \neg E(x_i, x_{i-1})]$    (11)

Procedure 1 thus derives on $\mathcal{P}$ an infinite set of Horn clauses, and Theorem 22 shows that no strong rewriting of $\mathcal{P}$ exists.

Example 17. Let $\mathcal{P}$ be defined as follows.

$B_1(x_0) \lor B_2(x_0) \lor \neg A(x_0)$    (12)

$A(x_1) \lor \neg E(x_1, x_0) \lor \neg B_1(x_0)$    (13)

$A(x_1) \lor \neg E(x_1, x_0) \lor \neg B_2(x_0)$    (14)

When applied to $\mathcal{P}$, Procedure 1 will eventually compute infinitely many clauses $C_n$ of the following form:

$C_n = A(x_n) \lor [\bigvee_{i=1}^{n} \neg E(x_i, x_{i-1})] \lor \neg A(x_0)$

However, for each $n > 1$, clause $C_n$ is a logical consequence of clause $C_1$, so the program consisting of the Horn clauses (13) and (14) together with $C_1$ is a strong rewriting of $\mathcal{P}$.

Example 18 demonstrates another problem that can arise even if $\mathcal{P}$ is nearly-monadic and simple.

Example 18. Let $\mathcal{P}$ be the following program:

$\neg R(x,y) \lor A(x)$    (15)

$\neg R(x,y) \lor B(x)$    (16)

$\neg A(x) \lor \neg B(x) \lor C(x) \lor D(x)$    (17)

Now resolving (15) and (17) produces (18); and resolving (16) and (18) produces (19).

$\neg R(x,y) \lor \neg B(x) \lor C(x) \lor D(x)$    (18)

$\neg R(x,y_1) \lor \neg R(x,y_2) \lor C(x) \lor D(x)$    (19)

Clause (19) contains more variables than clauses (15) and (16), which makes bounding the clause size difficult.

Notwithstanding Example 18, we believe one can prove that Procedure 1 terminates if $\mathcal{P}$ is nearly-monadic and simple. However, apart from making the termination proof more involved, deriving clauses such as (19) is clearly inefficient. We therefore extend Procedure 1 with the condensation simplification rule, which eliminates redundant literals in clauses such as (19). A condensation of a clause $C$ is a clause $D$ with the least number of literals such that $D \subseteq C$ and $C$ subsumes $D$. A condensation of $C$ is unique up to variable renaming, so we usually speak of the condensation of $C$. We next show that Theorems 11 and 12 hold even with condensation.
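Condensation of function-free clauses can be illustrated with a brute-force sketch. Everything about the representation is an assumption made for this example: literals are triples `(is_positive, predicate, arguments)`, variables start with `?`, and the subsumption test enumerates all substitutions, which is exponential in the number of variables and only meant for tiny clauses such as (19).

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")

def apply(clause, subst):
    return frozenset((sign, pred, tuple(subst.get(t, t) for t in args))
                     for (sign, pred, args) in clause)

def subsumes(c, d):
    """True iff some substitution sigma maps every literal of c into d."""
    variables = sorted({t for (_, _, args) in c for t in args if is_var(t)})
    terms = sorted({t for (_, _, args) in d for t in args})
    return any(apply(c, dict(zip(variables, image))) <= d
               for image in product(terms, repeat=len(variables)))

def condense(clause):
    """Drop literals one at a time while the clause still subsumes the remainder."""
    clause = frozenset(clause)
    changed = True
    while changed:
        changed = False
        for literal in clause:
            smaller = clause - {literal}
            if subsumes(clause, smaller):
                clause, changed = smaller, True
                break
    return clause

# Clause (19): not R(x,y1) or not R(x,y2) or C(x) or D(x).
CLAUSE_19 = {
    (False, "R", ("?x", "?y1")),
    (False, "R", ("?x", "?y2")),
    (True, "C", ("?x",)),
    (True, "D", ("?x",)),
}
```

Applied to `CLAUSE_19`, `condense` removes one of the two $\neg R$ literals, since mapping $y_2$ to $y_1$ (or vice versa) sends the whole clause into the three remaining literals. Which of the two literals survives depends on iteration order, in line with uniqueness only up to variable renaming.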

Lemma 19. Theorems 11 and 12 hold if Procedure 1 is mod- 
ified so that, after line 5, C is replaced with its condensation. 

One can prove that all relevant consequences of nearly- 
monadic and simple clauses are also nearly-monadic and sim- 
ple, so by using condensation to remove redundant literals, 
we obtain Lemma 20, which clearly implies Theorem 21. 



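Lemma 20. If used with condensation, Procedure 1 terminates when applied to a simple nearly-monadic program $\mathcal{P}$.

Theorem 21. Let $\mathcal{P} = \mathsf{DD}(\Omega_\mathcal{T})$ for $\mathcal{T}$ a DL-Lite$^{\mathcal{H},+}_{bool}$-TBox. Procedure 1 with condensation terminates when applied to $\mathcal{P}$ and returns $\mathcal{P}_H$; furthermore, $\mathcal{P}_H \cup \Xi_\mathcal{T}$ is a rewriting of $\mathcal{T}$.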

We thus obtain a tractable (w.r.t. data complexity) procedure for answering queries over DL-Lite$^{\mathcal{H},+}_{bool}$-TBoxes. Furthermore, given a ground query $Q$ and a nearly-monadic and simple program $\mathcal{P}_H$ obtained by Theorem 21, it should be possible to match the NLOGSPACE lower complexity bound by Artale et al. [2009] as follows. First, one should apply backward chaining to $Q$ and $\mathcal{P}_H$ to compute a UCQ $Q'$ such that $\mathsf{cert}(Q, \mathcal{P}_H, \Xi_\mathcal{T}(\mathcal{A})) = \mathsf{cert}(Q', \emptyset, \Xi_\mathcal{T}(\mathcal{A}))$; since all nearly-monadic rules in $\mathcal{P}_H$ are simple, it should be possible to show that such 'unfolding' always terminates. Second, one should transform $\Xi_\mathcal{T}$ into an equivalent piecewise-linear datalog program $\Xi'_\mathcal{T}$. Although these transformations should be relatively straightforward, a formal proof would require additional machinery and is thus left for future work.

5 Limits to Strong Rewritability 

We next show that strong rewritings may not exist for rather
simple non-Horn ELU-TBoxes that are rewritable in general.
This is interesting because it shows that an algorithm capable
of rewriting a larger class of TBoxes necessarily must depart
from the common approaches based on sound inferences.

Theorem 22. The ELU-TBox 𝒯 corresponding to the program
𝒫 from Example 16 and the ground CQ Q = G(x1) are
Q-rewritable, but not strongly Q-rewritable.

The proof of Theorem 22 proceeds as follows. First, we
show that, for each ABox 𝒜 encoding a directed graph, we
have cert(Q, 𝒯, 𝒜) ≠ ∅ iff the graph contains a pair of
vertices reachable by both an even and an odd number of edges.
Second, we show that the latter property can be decided using a
datalog program that uses new relations not occurring in 𝒯.
Third, we construct an infinite set of rules ℛ entailed by each
strong rewriting of 𝒯. Fourth, we show that 𝒫′ ⊭ ℛ holds
for each finite datalog program 𝒫′ such that 𝒯 ⊨ 𝒫′.

Since our procedure from Section 4 produces only strong 
rewritings, it cannot terminate on a TBox that has no strong 
rewritings. This is illustrated in Example 16, which shows 
that Procedure 1 does not terminate when applied to (the 
clausification of) the TBox from Theorem 22. 

6 Outlook 

Our work opens many possibilities for future research. On
the theoretical side, we will investigate whether one can
decide existence of a strong rewriting for a given SHI-TBox
𝒯, and how to modify Procedure 1 so that termination is
guaranteed. Bienvenue et al. [2013] recently showed that
rewritability of unary ground queries over ALC-TBoxes is decidable;
however, their result does not consider strong rewritability or
binary ground queries. On the practical side, we will
investigate whether Procedure 1 can be modified to use ordered
resolution instead of unrestricted resolution. We will also
implement our technique and evaluate its applicability.
A Proofs for Section 3

Before presenting the proof of Theorem 1, we recapitulate the definition of monotone polynomial projections, which are
frequently used to transfer bounds on the circuit size from one family of monotone Boolean functions to another. Let f be a
monotone Boolean function with inputs x, and let g be a monotone Boolean function with inputs y. Then, f is a monotone
projection of g if a mapping ρ : y → {f, t} ∪ x exists such that f(x) = g(ρ(y)) for each value of x. Given such a mapping ρ,
a monotone circuit that computes g(y) can be transformed to a monotone circuit that computes f(x) by replacing each input
y_i ∈ y with ρ(y_i). Furthermore, a family of Boolean functions {f_n} is a polynomial monotone projection of a family {g_k} if a
polynomial p(n) exists such that each f_n is a monotone projection of some g_k with k ≤ p(n); if that is the case and the family
of functions {g_k} can be realised by a family of monotone circuits of polynomial size, then so can {f_n}.
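
As a small illustration of the definition (a sketch with invented example functions, not the functions used in the proof below), the following code checks by exhaustive enumeration that two-input OR and AND are monotone projections of three-input majority. The encoding of the mapping ρ as a list of ("var", index) and ("const", value) pairs is an assumption made for this example.

```python
from itertools import product

def majority3(y):
    """g: monotone majority function on three input bits."""
    return sum(y) >= 2

def or2(x):
    """f: disjunction of two input bits."""
    return x[0] or x[1]

def and2(x):
    """Another candidate f: conjunction of two input bits."""
    return x[0] and x[1]

def project(rho, x):
    """Build the input vector rho(y) of g from the input vector x of f."""
    return tuple(x[v] if kind == "var" else v for kind, v in rho)

def is_projection(f, g, rho, n_inputs):
    """Check f(x) = g(rho(y)) for each value of x."""
    return all(f(x) == g(project(rho, x))
               for x in product([False, True], repeat=n_inputs))

# rho sends y1 to x1, y2 to x2, and y3 to a constant.
RHO_OR = [("var", 0), ("var", 1), ("const", True)]
RHO_AND = [("var", 0), ("var", 1), ("const", False)]
```

Fixing the third majority input to t yields OR, and fixing it to f yields AND, so a monotone circuit for majority turns into a circuit for either function by rewiring its inputs only.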

Theorem 1. An acyclic ELU-TBox 𝒯 and a ground CQ Q exist such that 𝒯 is not Q-rewritable.

Proof. Let 𝒯 be the following acyclic ELU-TBox:

    F_R ≡ R ⊓ ∃edge.R        F_B ≡ B ⊓ ∃edge.B
    F_G ≡ G ⊓ ∃edge.G        F ≡ F_R ⊔ F_B ⊔ F_G
    V ⊑ R ⊔ G ⊔ B            NC ≡ ∃vertex.F

Furthermore, let v be a fixed individual, and let Q = NC(v). We next represent the problem of answering Q over 𝒯 and an
arbitrary input ABox 𝒜 using a family of monotone functions {g_{n(u)}}. The input size of 𝒜 is the number u of individuals
occurring in 𝒜 different from the fixed individual v; we assume that these individuals are labelled a_1, ..., a_u. Furthermore,
to unify the notation, let a_{u+1} = v. Using the signature of 𝒯, one can then construct at most n(u) = 2(u + 1)² + 9(u + 1)
assertions; hence, we encode 𝒜 using n(u) bits y^{edge}_{i,j}, y^{vertex}_{i,j}, and y^{A}_{i} as follows:

• for each R ∈ {edge, vertex} and 1 ≤ i, j ≤ u + 1, bit y^{R}_{i,j} is t if and only if R(a_i, a_j) ∈ 𝒜; and

• for each A ∈ {R, G, B, F_R, F_B, F_G, F, V, NC}, bit y^{A}_{i} is t if and only if A(a_i) ∈ 𝒜.

The family of Boolean functions {g_{n(u)}} is defined such that, given a vector of bits y encoding an ABox 𝒜 of input size u, we
have g_{n(u)}(y) = t if and only if 𝒯 ∪ 𝒜 ⊨ Q. Since first-order logic is monotonic, each g_{n(u)} is clearly monotone.

Let {f_{m(s)}} be the family of monotone Boolean functions associated with non-3-colorability as defined in Section 3. We next
show that {f_{m(s)}} is a monotone polynomial projection of {g_{n(u)}}. To this end, we first show that, for each positive integer s,
function f_{m(s)} is a monotone projection of g_{n(s)}. Let ρ be the following mapping, where A is a placeholder for each concept
from the signature of 𝒯 different from V, and x_{i,j} denotes the bit of x that encodes the presence of an edge between nodes i and j:

    ρ(y^{edge}_{i,j}) = ρ(y^{edge}_{j,i}) = x_{i,j}   for 1 ≤ i < j ≤ s
    ρ(y^{edge}_{i,j}) = f                              otherwise

    ρ(y^{vertex}_{i,j}) = t                            for i = s + 1 and 1 ≤ j ≤ s
    ρ(y^{vertex}_{i,j}) = f                            otherwise

    ρ(y^{V}_{i}) = t                                   for 1 ≤ i ≤ s
    ρ(y^{V}_{i}) = f                                   for i = s + 1

    ρ(y^{A}_{i}) = f                                   for 1 ≤ i ≤ s + 1

We now show that f_{m(s)}(x) = g_{n(s)}(ρ(y)) for each vector x of m(s) bits. To this end, let G be the undirected graph associated
with x containing nodes 1, ..., s. It is straightforward to check that ρ(y) is then a vector of n(s) bits that encodes the ABox
𝒜_G with individuals a_1, ..., a_s, a_{s+1} = v containing the following assertions:

• edge(a_i, a_j) and edge(a_j, a_i) for all 1 ≤ i < j ≤ s such that G contains an edge between i and j, and

• assertions V(a_i) and vertex(v, a_i) for each 1 ≤ i ≤ s.

Furthermore, it is routine to check that G is non-3-colorable iff 𝒯 ∪ 𝒜_G ⊨ Q; but then, by the definition of f_{m(s)} and g_{n(s)},
we have f_{m(s)}(x) = g_{n(s)}(ρ(y)), as required. Finally, for a suitable polynomial p, we clearly have n(s) ≤ p(m(s)). Thus, the family of
monotone functions {f_{m(s)}} is a monotone polynomial projection of the family of monotone functions {g_{n(s)}}.
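
The encoding of a graph as the ABox 𝒜_G, and the claim that 𝒯 ∪ 𝒜_G ⊨ Q holds exactly for non-3-colorable graphs, can be explored with the following sketch. It is an illustration under assumptions rather than a decision procedure for arbitrary ABoxes: assertions are represented as hypothetical tuples, the function `entails_nc` handles only ABoxes of the shape produced by `abox_from_graph`, and a countermodel is searched for by assigning exactly one of R, G, B to each individual labelled V.

```python
from itertools import product

def abox_from_graph(s, edges):
    """Assertions of A_G for an undirected graph on nodes 1..s; 'v' is the fixed individual."""
    abox = set()
    for i, j in edges:
        abox.add(("edge", i, j))
        abox.add(("edge", j, i))
    for i in range(1, s + 1):
        abox.add(("V", i))
        abox.add(("vertex", "v", i))
    return abox

def entails_nc(abox):
    """Brute-force check of NC(v) for ABoxes built by abox_from_graph (assumption)."""
    vertices = sorted(a[1] for a in abox if a[0] == "V")
    edges = [(a[1], a[2]) for a in abox if a[0] == "edge"]
    succ_of_v = {a[2] for a in abox if a[0] == "vertex" and a[1] == "v"}
    for colours in product("RGB", repeat=len(vertices)):
        colour = dict(zip(vertices, colours))
        # F holds at i if i and one of its edge-successors share a colour;
        # NC(v) holds if v has a vertex-successor satisfying F.
        clash = any(colour[i] == colour[j] and i in succ_of_v for i, j in edges)
        if not clash:
            return False  # this colouring yields a countermodel
    return True
```

A triangle admits a proper 3-colouring, so NC(v) is not entailed for it, whereas the complete graph on four nodes has no proper 3-colouring and NC(v) is entailed.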

The above observation, Item 2 of Theorem 2, and the properties of monotone polynomial projections imply that the query
answering problem for Q and 𝒯 cannot be solved using monotone circuits of polynomial size. Now assume that a datalog
program 𝒫 exists that is a Q-rewriting of 𝒯. By Item 1 of Theorem 2, answering Q over 𝒫, and so the problem of answering
Q over 𝒯 as well, can be solved using monotone circuits of polynomial size, which is a contradiction. □



B Proof of Theorem 6 

Theorem 6. Let 𝒯 be a normalised SHI-TBox, let 𝒜 be an ABox, and let α be a fact. Then, 𝒯 ∪ 𝒜 ⊨ α if and only if
Ω_𝒯 ∪ Ξ_𝒯(𝒜) ⊨ α.

Proof. We prove the contrapositive: for each fact α, we have 𝒯 ∪ 𝒜 ⊭ α if and only if Ω_𝒯 ∪ Ξ_𝒯(𝒜) ⊭ α.

(=>) It is routine to show that T7- is a model-conservative extension of T [Simancik, 2012]. Furthermore, for all con- 
cepts A and B, each atomic role R, and each role S such that A C 3S.B E T, S C^ R, and S C^ i?^, we clearly have 
T \= A \I- 3i?.self . By these two properties, ^j- is a model-conservative extension of T. Finally, it is obvious that T \~ S7-. 
Now consider an arbitrary fact a such that TU A^ a. Then, an interpretation / exists such that / \— TU A and I ^ a. Since 
17 7- is a model-conservative extension of T and T |= S7-, an interpretation J exists such that J \^ flj- and J ^ S7- (^) ; further- 
more, since a does not use the symbols occurring in ilj- but not in T, we also have J ^ a. Thus, we have ftj- U Ej-(A) ^ a, 
as required. 

(⇐) Consider an arbitrary fact α such that Ω_𝒯 ∪ Ξ_𝒯(𝒜) ⊭ α. Then, an interpretation I = (Δ^I, ·^I) exists such that
I ⊨ Ω_𝒯 ∪ Ξ_𝒯(𝒜) and I ⊭ α. Without loss of generality, we can assume that I is of a special tree shape, which we
describe next. Let N_𝒜 be the set of individuals occurring in 𝒜, and let N be the smallest set such that N_𝒜 ⊆ N and, if u ∈ N,
then u.i ∈ N for each nonnegative integer i. Then, we can assume that I satisfies all of the following properties:

1. Δ^I ⊆ N;

2. c^I = c for each individual c ∈ N_𝒜;

3. for each atomic role R, each pair in R^I is of the form ⟨s, s.i⟩, ⟨s.i, s⟩, or ⟨a, b⟩ for s ∈ N and a, b ∈ N_𝒜;

4. for each pair ⟨c, d⟩ ∈ R^I such that c, d ∈ N_𝒜, we have c = d or R(c, d) ∈ Ξ_𝒯(𝒜); and

5. for each atomic role R, each individual c ∈ N_𝒜, and each c.i ∈ N, if {⟨c, c.i⟩, ⟨c.i, c⟩} ⊆ R^I, then there exist concepts A
and B and a role S such that A ⊑ ∃S.B ∈ 𝒯, S ⊑*_𝒯 R, S ⊑*_𝒯 R⁻, and c ∈ A^I.

A model I of Ω_𝒯 satisfying properties (1)-(3) can be obtained, for example, using the hypertableau calculus by Motik et al.
[2009]. Furthermore, if translated into first-order logic, all role atoms in the consequent of an axiom in Ω_𝒯 are of the form
R(x, x), or they occur in formulae of the form ∃y.R(x, y) ∧ ...; thus, the hypertableau calculus cannot derive an atom of the
form R(a, b) with a ≠ b, thus ensuring property (4). Finally, since Ω_𝒯 is normalised, concepts of the form ∃S.B occur in Ω_𝒯
only in axioms of the form A ⊑ ∃S.B; but then, the hypertableau calculus ensures that ⟨c, c.i⟩ ∈ S^I or ⟨c.i, c⟩ ∈ S^I only if
c ∈ A^I; consequently, the only way for {⟨c, c.i⟩, ⟨c.i, c⟩} ⊆ R^I to hold is if property (5) holds.

To complete the proof, we next construct an interpretation J and show that J ⊨ 𝒯 ∪ 𝒜 and J ⊭ α. In particular, let J be
the following interpretation defined inductively on the quasi-ordering corresponding to relation ⊑*_𝒯:

• Δ^J = Δ^I;

• c^J = c^I = c for each individual c ∈ N_𝒜;

• A^J = A^I for each atomic concept A;

• R^J is the transitive closure of R^I for each atomic role R that is transitive in 𝒯; and

• R^J = R^I ∪ ⋃ S^J for each atomic role R that is not transitive in 𝒯, where the union ranges over all roles S
  such that S ⊑*_𝒯 R and R ⋢*_𝒯 S.

If 𝒯 does not contain concepts of the form ∃R.self, then J ⊨ 𝒯 follows from the standard proofs of transitivity elimination
in SHI [Simancik, 2012] and DL-Lite^{H,+}_{bool} [Artale et al., 2009]; furthermore, it is easy to see that the presence of atoms ∃R.self
requires only minor changes to these proofs. Furthermore, since 𝒜 ⊆ Ξ_𝒯(𝒜), we clearly have J ⊨ 𝒜.

We are left to show that J ⊭ α. If α is of the form A(c), the claim follows from the proofs by Simancik [2012] and Artale et
al. [2009]. Hence, assume that α is of the form α = T(c, d), and assume for the sake of contradiction that J ⊨ T(c, d). Then,
by the definition of J, there exist an atomic role R and {u_0, u_1, ..., u_n} ⊆ Δ^I such that R is transitive in 𝒯, R ⊑*_𝒯 T, c = u_0,
d = u_n, and ⟨u_{i-1}, u_i⟩ ∈ R^I for each 1 ≤ i ≤ n. We consider the following two cases.

• Assume that, for each 0 ≤ i ≤ n, if u_i ∈ N_𝒜, then u_i = c. Then, we clearly have c = d. Since I satisfies property (3),
some 1 ≤ i ≤ n exists such that u_i is of the form c.j for some j and {⟨c, c.j⟩, ⟨c.j, c⟩} ⊆ R^I holds. Furthermore, since
I satisfies property (5), concepts A and B and a role S exist such that A ⊑ ∃S.B ∈ 𝒯, S ⊑*_𝒯 R, S ⊑*_𝒯 R⁻, and c ∈ A^I.
By Definition 5, then A ⊑ ∃R.self ∈ Ω_𝒯, which implies ⟨c, c⟩ ∈ R^I. Finally, R ⊑*_𝒯 T implies R ⊑*_{Ω_𝒯} T; hence, we have
⟨c, c⟩ ∈ T^I as well, which contradicts our assumption that I ⊭ α.

• Assume that some 1 ≤ i ≤ n exists such that u_i ∈ N_𝒜 and u_i ≠ c. We eliminate from the sequence u_0, u_1, ..., u_n each
subsequence u_{i+1}, ..., u_j with 0 ≤ i < j ≤ n such that u_i ∈ N_𝒜, u_j ∈ N_𝒜, and u_k ∈ N \ N_𝒜 for each i < k < j; let
v_0, ..., v_ℓ be the resulting sequence. Since I satisfies property (3), each eliminated subsequence satisfies u_i = u_j; hence,
for each 1 ≤ i ≤ ℓ, we have ⟨v_{i-1}, v_i⟩ ∈ R^I. Furthermore, since u_i exists such that u_i ∈ N_𝒜 and u_i ≠ c, we have ℓ ≥ 1,
v_0 = c, and v_ℓ = d. Finally, note that the above definition eliminates each subsequence u_i, u_{i+1} such that u_i = u_{i+1}
(condition u_k ∈ N \ N_𝒜 for each i < k < j is then vacuously satisfied); therefore, sequence v_0, ..., v_ℓ consists of distinct
individuals in N_𝒜. But then, since I satisfies property (4), we have that R(v_{i-1}, v_i) ∈ Ξ_𝒯(𝒜) for each 1 ≤ i ≤ ℓ. Finally,
by the definition of Ξ_𝒯, then Ξ_𝒯(𝒜) contains R(v_0, v_ℓ) = R(c, d), and consequently T(c, d) ∈ Ξ_𝒯(𝒜) as well. This,
however, contradicts our assumption that I ⊭ α. □

C Proofs for Section 4.2 

Theorem 8. For 𝒯 a normalised ALCHI-TBox, DD(𝒯) satisfies the following:

1. program DD(𝒯) is nearly-monadic; furthermore, if 𝒯 is a DL-Lite^{H,+}_{bool}-TBox, then DD(𝒯) is also simple;

2. 𝒯 ⊨ DD(𝒯); and

3. cert(Q, 𝒯, 𝒜) = cert(Q, DD(𝒯), 𝒜) for each ABox 𝒜 and each ground query Q.

Sketch. The algorithm by Hustadt et al. [2007] first translates 𝒯 into a set of skolemised clauses. An inspection of the algorithm
reveals that, without concepts of the form ∃R.self, each resulting clause is of one of the following forms, where R is an atomic
role, f is a function symbol, and A_(i), B_(i), C_(i), and D_(i) are atomic concepts, ⊤, or ⊥:

    ¬A(x) ∨ R(x, f(x))                                        (20)
    ¬A(x) ∨ R(f(x), x)                                        (21)
    ¬R(x, y) ∨ S(x, y)                                        (22)
    ¬R(x, y) ∨ S(y, x)                                        (23)
    ¬A(x) ∨ ¬R(x, y) ∨ ¬B(y) ∨ C(x) ∨ D(y)                    (24)
    ⋁ ¬A_i(x) ∨ ⋁ ¬B_i(f(x)) ∨ ⋁ C_i(x) ∨ ⋁ D_i(f(x))         (25)
    ⋁ ¬A_i(x) ∨ ⋁ C_i(x)                                      (26)

Furthermore, since 𝒯 is normalised, axioms with concepts of the form ∃R.self are translated into clauses of the following form:

    ¬R(x, x) ∨ A(x)                                           (27)
    ¬A(x) ∨ R(x, x)                                           (28)

The algorithm next saturates the resulting set of clauses by ordered resolution, which is parameterised by a carefully
constructed literal ordering and selection function; these parameters ensure that binary resolution and positive factoring are
performed only with literals that are underlined in (20)-(28). The selection function can be extended to select atom R(x, x) in
each clause of type (27); furthermore, the ordering can be modified so that each atom R(x, x) is larger than all atoms A(x),
thus ensuring that only atom R(x, x) participates in inferences with clauses of type (28). Hustadt et al. [2007] show that each
binary resolution or positive factoring inference, when applied to clauses of type (20)-(26), produces a clause of type (20)-(23)
or (25)-(26). This is easily extended to clauses of type (27)-(28):

• a clause of type (27) cannot be resolved with any other clause; 

• resolving a clause of type (28) with a clause of type (24) produces a clause of type (26); and 

• resolving a clause of type (28) with a clause of type (22) or (23) produces a clause of type (28). 

Hustadt et al. [2007] then show that the disjunctive program DD(𝒯) can be obtained as the set of all clauses after saturation of
type (22)-(24) and (26). For the case when 𝒯 contains atoms of the form ∃R.self, program DD(𝒯) should also include clauses
of type (27) and (28), and the proof by Hustadt et al. [2007] applies without any problems. Furthermore, it is straightforward
to verify that DD(𝒯) is a nearly-monadic program.

Finally, if 𝒯 is a DL-Lite^{H,+}_{bool}-TBox, the only difference is that, in each clause of type (24), we have either A = ⊤ and C = ⊥,
or B = ⊤ and D = ⊥. Since saturation does not introduce clauses of type (24), program DD(𝒯) is clearly simple. □

D Proofs for Section 4.3 

Theorem 12. Let 𝒩 be a set of clauses, let 𝒜 be an ABox, let C be a Horn clause such that 𝒩 ∪ 𝒜 ⊨ C, and assume that
Procedure 1 is applied to 𝒩. Then, after some finite number of iterations of the loop in lines 3-9, we have 𝒩_H ∪ 𝒜 ⊨ C.

Proof. To prove our claim, we assume that Procedure 1 is applied to S = 𝒩 ∪ 𝒜. Towards this goal, we associate with each
clause C ∈ S_H ∪ S_H̄ a set of facts F_C; for each such F_C, let ¬F_C = ⋁_{α ∈ F_C} ¬α. We define F_C inductively on the applications
of inference rules in Procedure 1; furthermore, we show in parallel that, at any point in time, for each clause C ∈ S_H ∪ S_H̄ and
the corresponding set F_C, the following properties are satisfied:

(a) 𝒩 ⊨ ¬F_C ∨ C, and

(b) F_C ⊆ 𝒜.

For the base case, consider an arbitrary clause C ∈ S. If C ∈ 𝒩, we define F_C = ∅; otherwise, we have C ∈ 𝒜 \ 𝒩, so C
is a fact, and we define F_C = {C}. Properties (a) and (b) are clearly satisfied.

For the induction step, assume that the two properties are satisfied for each clause C ∈ S_H ∪ S_H̄ at some point in time. We
consider the following two ways in which Procedure 1 can extend S_H or S_H̄.

• Assume that resolution is applied to clauses C_1 = D_1 ∨ A_1 and C_2 = D_2 ∨ ¬A_2, deriving clause C = D_1σ ∨ D_2σ. Let
F_C = F_{C_1} ∪ F_{C_2}, so property (b) is clearly satisfied. By induction assumption, we have 𝒩 ⊨ ¬F_{C_1} ∨ D_1 ∨ A_1 and
𝒩 ⊨ ¬F_{C_2} ∨ D_2 ∨ ¬A_2. By the soundness of binary resolution, we have {D_1 ∨ A_1, D_2 ∨ ¬A_2} ⊨ D_1σ ∨ D_2σ. But
then, since ¬F_{C_1} and ¬F_{C_2} contain only constants, we have 𝒩 ⊨ ¬F_{C_1} ∨ ¬F_{C_2} ∨ D_1σ ∨ D_2σ, as required for (a).

• Assume that positive factoring is applied to a clause C_1 = D_1 ∨ A_1 ∨ B_1, deriving clause C = D_1σ ∨ A_1σ. Let
F_C = F_{C_1}, so property (b) is clearly satisfied. By induction assumption, we have 𝒩 ⊨ ¬F_{C_1} ∨ D_1 ∨ A_1 ∨ B_1. By the
soundness of positive factoring, we have {D_1 ∨ A_1 ∨ B_1} ⊨ D_1σ ∨ A_1σ. But then, since ¬F_{C_1} contains only constants,
we have 𝒩 ⊨ ¬F_{C_1} ∨ D_1σ ∨ A_1σ, as required for (a).

We now show the main claim of this theorem. To this end, consider an arbitrary Horn clause C such that 𝒩 ∪ 𝒜 ⊨ C. By
Theorem 11, at some point in time during the application of Procedure 1 to S, we have S_H ⊨ C. Note that S_H is a finite set.

Consider an arbitrary Horn clause D ∈ S_H. By property (a), we have 𝒩 ⊨ ¬F_D ∨ D. Furthermore, ¬F_D ∨ D is a Horn
clause, so by Theorem 11, at some point in time during the application of Procedure 1 to 𝒩, we have 𝒩_H ⊨ ¬F_D ∨ D.
Finally, by property (b), we have F_D ⊆ 𝒜. These observations now imply that 𝒩_H ∪ 𝒜 ⊨ D.

Now let 𝒩′_H = ⋃_{D ∈ S_H} {¬F_D ∨ D}; clearly, 𝒩′_H ∪ 𝒜 ⊨ S_H. Note that Procedure 1 is monotonic in the sense that, if 𝒩_H ⊨ E
at some point in time for some clause E, then this also holds at all future points in time. Furthermore, 𝒩′_H is finite, so at
some point in time during the application of Procedure 1 to 𝒩, we have 𝒩_H ⊨ 𝒩′_H. By the observations from the previous
paragraph, we then have 𝒩_H ∪ 𝒜 ⊨ S_H as well, which implies 𝒩_H ∪ 𝒜 ⊨ C, as required. □

Lemma 13. Let 𝒫 be a nearly-monadic program, and assume that Procedure 1 terminates when applied to 𝒫 and returns 𝒫_H.
Then, 𝒫_H is a nearly-monadic datalog program.

Proof. We prove by induction on the application of the inference rules in Procedure 1 that, at any point in time, 𝒫_H ∪ 𝒫_H̄ is
a nearly-monadic program. The base case is clearly satisfied since 𝒫 is nearly-monadic. For the induction step, we consider
the possible inferences that can derive a clause in 𝒫_H ∪ 𝒫_H̄. First, note that positive factoring is never applicable to a clause
of type 2 from Definition 7; furthermore, when applied to a clause of type 1, positive factoring always produces a clause of the
same type. Second, since clauses of type 2 are Horn, binary resolution can be applied only if at least one clause is of type 1,
and the resolvent is then clearly of type 1 as well. □

Theorem 14. Let 𝒫 = DD(Ω_𝒯) for 𝒯 an SHI-TBox. If, when applied to 𝒫, Procedure 1 terminates and returns 𝒫_H, then
𝒫_H ∪ Ξ_𝒯 is a rewriting of 𝒯.

Proof. Consider an arbitrary ABox 𝒜 and an arbitrary fact α. By Theorem 6, we have that 𝒯 ∪ 𝒜 ⊨ α if and only if
Ω_𝒯 ∪ Ξ_𝒯(𝒜) ⊨ α. By Theorem 8, the latter holds if and only if DD(Ω_𝒯) ∪ Ξ_𝒯(𝒜) ⊨ α. Moreover, since α is a Horn
clause, by Theorem 12, the latter holds if and only if 𝒫_H ∪ Ξ_𝒯(𝒜) ⊨ α. We now show that the latter holds if and only if
𝒫_H ∪ Ξ_𝒯 ∪ 𝒜 ⊨ α. Clearly, 𝒫_H ∪ Ξ_𝒯(𝒜) ⊨ α implies 𝒫_H ∪ Ξ_𝒯 ∪ 𝒜 ⊨ α by monotonicity of first-order logic, so we next
focus on showing that 𝒫_H ∪ Ξ_𝒯(𝒜) ⊭ α implies 𝒫_H ∪ Ξ_𝒯 ∪ 𝒜 ⊭ α.

By Theorem 8, Lemma 13, and the fact that Procedure 1 is sound, program 𝒫_H is nearly-monadic and Ω_𝒯 ⊨ 𝒫_H. Now
let 𝒫¹_H and 𝒫²_H be the subsets of 𝒫_H of the rules of type 1 and 2, respectively. Since Ω_𝒯 ⊨ 𝒫²_H, by the definition of Ξ_𝒯
we have Ξ_𝒯 ⊨ 𝒫²_H. Furthermore, if a role atom occurs in the head of a rule in 𝒫¹_H, the atom is of the form R(z, z); hence,
each fact involving a role atom in 𝒫¹_H(Ξ_𝒯(𝒜)) \ Ξ_𝒯(𝒜) is necessarily of the form R(c, c). But then, such facts clearly cannot
trigger a transitivity rule in Ξ_𝒯 to derive a new fact; furthermore, for each rule r ∈ Ξ_𝒯 of the form R(x, y) → S(x, y) or
R(x, y) → S(y, x), we have 𝒫_H ⊨ r; consequently, Ξ_𝒯(𝒫_H(Ξ_𝒯(𝒜))) = 𝒫_H(Ξ_𝒯(𝒜)), and the property holds.

Thus, 𝒯 ∪ 𝒜 ⊨ α if and only if 𝒫_H ∪ Ξ_𝒯 ∪ 𝒜 ⊨ α for arbitrary fact α; but then, for an arbitrary ground query Q, we also
have 𝒯 ∪ 𝒜 ⊨ Q if and only if 𝒫_H ∪ Ξ_𝒯 ∪ 𝒜 ⊨ Q, as required. □

E Proofs for Section 4.4 

Lemma 19. Theorems 11 and 12 hold if Procedure 1 is modified so that, after line 5, C is replaced with its condensation. 

Proof. Assume that Procedure 1 derives a clause C in line 5, and let D be the condensation of C. Since Procedure 1 is sound,
we have S_H ∪ S_H̄ ⊨ C; furthermore, since C subsumes D, we have {C} ⊨ D; but then, we have S_H ∪ S_H̄ ⊨ D as well. It
is therefore safe to add D to S_H or S_H̄, so let us assume that Procedure 1 does so; but then, this makes C redundant since D
subsumes C by the definition of condensation. □



Lemma 20. If used with condensation, Procedure 1 terminates when applied to a simple nearly-monadic program 𝒫.

Proof. Let V = V™ U V^. Since V is simple, each rule in P"^ is of the form (29) with each variable yi occurring at most once 
in the rule, and each rule in P^ is of the form (30) or (31). 

    ⋀ A_i(x) ∧ ⋀ R_i(x, x) ∧ ⋀ S_i(x, y_i) ∧ ⋀ T_i(y_i, x) → ⋁ U_i(x, x) ∨ ⋁ B_i(x)    (29)

    R(x, y) → S(x, y)                                                                   (30)

    R(x, y) → S(y, x)                                                                   (31)

It is now straightforward to check that Procedure 1 derives only rules of such form: positive factoring is never applicable to a
rule of the form (29)-(31), and binary resolution clearly derives only rules of these forms.

Now let C be an arbitrary rule derived in line 5 of Procedure 1, and let D be the condensation of C; furthermore, let n be
the number of binary atoms occurring in 𝒫. Since each variable y_i in C occurs at most once in the rule, there can be at most 2n
atoms of the form R(x, y_i) or R(y_i, x) different up to variable renaming; therefore, D contains at most 2n variables y_i. Since
the number of predicates in D is linear in the size of 𝒫, the size of each clause is linear in the size of 𝒫 as well. But then, there
can be at most exponentially many different clauses in 𝒫_H ∪ 𝒫_H̄, which implies termination of Procedure 1 using the standard
argument [Hustadt et al., 2007]. □

F Proofs for Section 5 

We first present a well-known characterisation of the entailment of a datalog rule from a first-order theory. The proof of
Proposition 23 is straightforward and can be found, for example, in the work by Cuenca Grau et al. [2012].

Proposition 23. Let ℱ be a set of first-order sentences, and let r be a datalog rule of the form C_1 ∧ ... ∧ C_n → H. Then, for
each substitution σ mapping each variable in r to a distinct individual not occurring in ℱ or r, we have ℱ ⊨ r if and only if

    ℱ ∪ {σ(C_1), ..., σ(C_n)} ⊨ σ(H).    (32)

We are now ready to prove Theorem 22. 

Theorem 22. The ELU-TBox 𝒯 corresponding to the program 𝒫 from Example 16 and the ground CQ Q = G(x1) are
Q-rewritable, but not strongly Q-rewritable.

Proof. Let Q = G(x1) be a ground query, and let 𝒯 be the ELU-TBox corresponding to the program 𝒫 from Example 16;
thus, 𝒯 consists of axioms (33)-(35), which are translated into disjunctive rules as shown below.

    ⊤ ⊑ G ⊔ B      ⇝    ⊤ → G(x) ∨ B(x)                  (33)

    ∃E.G ⊑ B       ⇝    E(x1, x0) ∧ G(x0) → B(x1)        (34)

    ∃E.B ⊑ G       ⇝    E(x1, x0) ∧ B(x0) → G(x1)        (35)

An individual v is reachable from an individual w by a path of length n in an ABox 𝒜 if individuals u_n, u_{n-1}, ..., u_0 exist
such that E(u_i, u_{i-1}) ∈ 𝒜 for each 1 ≤ i ≤ n, u_n = v, and u_0 = w. In this proof, we consider 0 to be an even number. We
next prove the following property (*), which characterises the answers to Q on 𝒯 ∪ 𝒜:

For each ABox 𝒜 containing only the E predicate and for each individual v, we have v ∈ cert(Q, 𝒯, 𝒜) iff an
individual w exists such that v is reachable from w by a path of positive even length and a path of positive odd length.
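
Property (*) can be checked experimentally on small ABoxes with the following sketch, which is an illustration and not part of the proof. It rests on two assumptions: an assertion E(x1, x0) is represented as the hypothetical pair (x1, x0), and countermodels are searched for only among interpretations whose domain consists of the ABox individuals, which suffices here because axioms (33)-(35) contain no existential restriction on the right-hand side.

```python
from itertools import product

def certain_G(edges, v):
    """v is a certain answer to G(x): G(v) holds in every labelling satisfying (33)-(35)."""
    nodes = sorted({n for e in edges for n in e} | {v})
    # By axiom (33), each individual is labelled G, B, or both.
    for labels in product(("G", "B", "GB"), repeat=len(nodes)):
        lab = dict(zip(nodes, labels))
        is_model = all(
            ("G" not in lab[x0] or "B" in lab[x1])      # axiom (34)
            and ("B" not in lab[x0] or "G" in lab[x1])  # axiom (35)
            for x1, x0 in edges
        )
        if is_model and "G" not in lab[v]:
            return False
    return True

def has_even_and_odd_path(edges, v):
    """Some w is reached from v along E by paths of positive even and positive odd length."""
    found = set()            # pairs (node, parity) reached using at least one edge
    frontier = [(v, 0)]
    while frontier:
        node, parity = frontier.pop()
        for x1, x0 in edges:
            step = (x0, 1 - parity)
            if x1 == node and step not in found:
                found.add(step)
                frontier.append(step)
    return any((w, 0) in found and (w, 1) in found for w, _ in found)
```

On the ABox {E(v2, v0), E(v2, v1), E(v1, v0)}, both functions report that v2 is an answer and that v1 is not. Paths may repeat edges in this sketch, which matches the definition of reachability given above.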

(Proof of *, direction ⇐) Let v and w be arbitrary individuals such that v is reachable from w by a path of even length and a
path of odd length; thus, 𝒜 contains sets of assertions of the following form, where k is a positive even number, ℓ is a positive
odd number, u_k = u′_ℓ = v, and u_0 = u′_0 = w.

    {E(u_k, u_{k-1}), ..., E(u_1, u_0)} ⊆ 𝒜          (36)

    {E(u′_ℓ, u′_{ℓ-1}), ..., E(u′_1, u′_0)} ⊆ 𝒜      (37)

Let I be an arbitrary model of 𝒯 ∪ 𝒜. Due to axiom (33), we have the following two possibilities.

• Assume that w ∈ G^I. Then, axioms (34) and (35) and the assertions in (36) ensure that u_j ∈ G^I for each even number
0 ≤ j ≤ k and u_i ∈ B^I for each odd number 1 ≤ i ≤ k − 1; thus, we have v ∈ G^I. Furthermore, axioms (34) and (35)
and the assertions in (37) ensure that u′_i ∈ G^I for each even number 0 ≤ i ≤ ℓ − 1 and u′_j ∈ B^I for each odd number
1 ≤ j ≤ ℓ; thus, we have v ∈ B^I. Consequently, we have v ∈ B^I ∩ G^I.

• Assume that w ∈ B^I. By a symmetric argument we also conclude that v ∈ B^I ∩ G^I.

Thus, we have v ∈ B^I ∩ G^I for an arbitrary model I of 𝒯 ∪ 𝒜, so v ∈ cert(Q, 𝒯, 𝒜), as desired.

(Proof of *, direction ⇒) Assume that v ∈ cert(Q, 𝒯, 𝒜); furthermore, for the sake of contradiction assume that, for each
individual w occurring in 𝒜, each path from w to v in 𝒜 is of odd length, or each path from w to v in 𝒜 is of even length. Let
I be the interpretation defined as follows:

• Δ^I contains all individuals in 𝒜;

• B^I = {w | each path from w to v in 𝒜 is of even length} ∪ {v};

• G^I = {w | each path from w to v in 𝒜 is of odd length}; and

• E^I = {⟨c, d⟩ | E(c, d) ∈ 𝒜}.

If there is no path from an individual w to individual v in 𝒜, then each path from w to v in 𝒜 is (vacuously) of both even and
odd length, so w ∈ B^I ∩ G^I; hence, axioms (33)-(35) are satisfied for such w. Furthermore, if w_1 is an individual such that
each path from w_1 to v in 𝒜 is of even length, and if w_2 satisfies the same property, then each path from w_1 to w_2 is also of
even length; hence, axioms (33)-(35) are satisfied for such w_1 and w_2. Finally, if w_1 is an individual such that each path from
w_1 to v in 𝒜 is of odd length, and if w_2 satisfies the same property, then each path from w_1 to w_2 is also of even length; hence,
axioms (33)-(35) are satisfied for such w_1 and w_2. Thus, we have I ⊨ 𝒯 ∪ 𝒜; however, v ∉ G^I, which is a contradiction.

This completes the proof of property (*). Now let 𝒫′ be the following datalog program, where odd and even are fresh binary
predicates:

    E(x1, x0) → odd(x1, x0)                       (38)

    odd(x2, x1) ∧ E(x1, x0) → even(x2, x0)        (39)

    even(x2, x1) ∧ E(x1, x0) → odd(x2, x0)        (40)

    odd(x, y) ∧ even(x, y) → G(x)                 (41)

    E(x1, x0) ∧ G(x0) → B(x1)                     (42)

    E(x1, x0) ∧ B(x0) → G(x1)                     (43)

Furthermore, let 𝒜 be an arbitrary ABox, and let 𝒜′ be the subset of 𝒜 containing precisely the assertions involving the E
predicate. Due to rules (38)-(41), for each individual v we have 𝒫′ ∪ 𝒜′ ⊨ G(v) iff an individual w exists such that v is
reachable from w in 𝒜′ via an even and an odd path; by property (*), the latter is the case iff 𝒯 ∪ 𝒜′ ⊨ G(v). Rules (42) and
(43) correspond to axioms (34) and (35), and they merely 'propagate' G and B from individuals explicitly labelled with G and
B in 𝒜; hence, it should be clear that 𝒫′ is a Q-rewriting of 𝒯. Note, however, that 𝒫′ is not a strong Q-rewriting of 𝒯: it
contains fresh predicates odd and even, so 𝒯 ⊭ 𝒫′.
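
The effect of rules (38)-(41) can be reproduced with a naive bottom-up evaluation, sketched below under the assumption that the ABox contains only E assertions, each represented as a hypothetical pair (x1, x0). Rules (42) and (43) are omitted because they only matter when the ABox also contains G or B assertions. The function name is invented for this illustration.

```python
def rewriting_answers(edges):
    """Evaluate rules (38)-(41) to a fixpoint and return the individuals v with G(v) derived."""
    edges = set(edges)
    odd = set(edges)            # rule (38): E(x1, x0) -> odd(x1, x0)
    even = set()
    changed = True
    while changed:
        changed = False
        for (x2, x1) in list(odd):      # rule (39): odd(x2, x1), E(x1, x0) -> even(x2, x0)
            for (y1, x0) in edges:
                if y1 == x1 and (x2, x0) not in even:
                    even.add((x2, x0))
                    changed = True
        for (x2, x1) in list(even):     # rule (40): even(x2, x1), E(x1, x0) -> odd(x2, x0)
            for (y1, x0) in edges:
                if y1 == x1 and (x2, x0) not in odd:
                    odd.add((x2, x0))
                    changed = True
    return {x for (x, _) in odd & even}  # rule (41): odd(x, y), even(x, y) -> G(x)
```

On the ABox {E(v2, v0), E(v2, v1), E(v1, v0)} the evaluation derives even(v2, v0) next to odd(v2, v0) and therefore G(v2), while a simple chain of two edges yields no answer. The fixpoint is reached after polynomially many steps because odd and even contain at most quadratically many pairs.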

To complete the proof, we next show that no strong Q-rewriting of 𝒯 exists. To this end, let ℛ be the infinite set containing
rule (44) instantiated for each positive even number n.

    E(x_n, x_0) ∧ E(x_n, x_{n-1}) ∧ ... ∧ E(x_1, x_0) → G(x_n)    (44)

It is straightforward to see that 𝒯 ⊨ ℛ: one can derive all such rules using resolution and factoring as shown in Example 16.
We next prove that ℛ satisfies the following two properties, which immediately imply the claim of this theorem.

1. 𝒫′ ⊨ ℛ for each strong Q-rewriting 𝒫′ of 𝒯.

2. For each finite set of datalog rules 𝒫′ such that 𝒯 ⊨ 𝒫′, we have 𝒫′ ⊭ ℛ.

(Property 1) Assume by contradiction that a strong Q-rewriting 𝒫′ of 𝒯 exists such that 𝒫′ ⊭ ℛ; then, there exists a rule
r ∈ ℛ such that 𝒫′ ⊭ r. Let C_1, ..., C_n be the body atoms of r, and note that the head atom of r is Q = G(x_n). Since r
is a datalog rule and 𝒫′ is a set of first-order formulas, by Proposition 23, for each substitution σ mapping each variable in
r to a distinct individual, we have 𝒫′ ∪ {σ(C_1), ..., σ(C_n)} ⊭ σ(Q). Now let σ be one such arbitrarily chosen substitution,
and let 𝒜 = {σ(C_1), ..., σ(C_n)}; clearly, we have 𝒫′ ∪ 𝒜 ⊭ σ(Q). In contrast, ℛ ∪ 𝒜 ⊨ σ(Q), and, due to 𝒯 ⊨ ℛ, we have
𝒯 ∪ 𝒜 ⊨ σ(Q). Thus, 𝒫′ is not a strong Q-rewriting of 𝒯, which contradicts our assumption.

(Property 2) Let 𝒫′ be an arbitrary finite set of datalog rules such that 𝒯 ⊨ 𝒫′, let m be the maximal number of body atoms
in a rule in 𝒫′, let n be the smallest even number such that n > m, and let 𝒜 be the following ABox where each v_i is distinct:

    𝒜 = {E(v_n, v_0), E(v_n, v_{n-1}), E(v_{n-1}, v_{n-2}), ..., E(v_1, v_0)}    (45)

We next show that, for each fact α, we have 𝒫′ ∪ 𝒜 ⊨ α iff α ∈ 𝒜; this clearly implies 𝒫′ ∪ 𝒜 ⊭ G(v_n), which by
Proposition 23 implies 𝒫′ ⊭ ℛ, as required for Property 2. We proceed by contradiction, so assume that a fact α exists such that
𝒫′ ∪ 𝒜 ⊨ α and α ∉ 𝒜. Then, a rule r ∈ 𝒫′ of the form r = C_1 ∧ ... ∧ C_k → H and a substitution σ exist such that, for
𝒜′ = {σ(C_1), ..., σ(C_k)}, we have 𝒜′ ⊆ 𝒜, α = σ(H), and α ∉ 𝒜; note that 𝒫′ ∪ 𝒜′ ⊨ α. We now make the following
observations.

• Since 𝒯 ⊨ 𝒫′, we have 𝒯 ∪ 𝒜′ ⊨ α.

• Since k ≤ m < n, we have 𝒜′ ⊊ 𝒜.

• Let r′ ∈ 𝒫′ be an arbitrary non-tautological rule of the form r′ = C′_1 ∧ ... ∧ C′_ℓ → H′. Since 𝒯 ⊨ 𝒫′, we have 𝒯 ⊨ r′.
The latter, however, is possible only if H′ is a unary atom involving the G or the B predicate, and each C′_i is an atom
involving the G, B, or E predicate. Thus, either α = B(v_i) or α = G(v_i) for some integer i.

• By property (*), we have 𝒯 ∪ 𝒜′ ⊭ G(v_j) and 𝒯 ∪ 𝒜′ ⊭ B(v_j) for each n > j ≥ 0 since each such individual v_j is
reachable from other individuals in 𝒜′ by at most one path. Thus, we have i = n in the previous item.

• Individual v_n is reachable from v_0 via two paths in 𝒜; furthermore, due to 𝒜′ ⊊ 𝒜, individual v_n is reachable from v_0 in
𝒜′ via at most one path. Therefore, by property (*), we have 𝒯 ∪ 𝒜′ ⊭ G(v_n) and 𝒯 ∪ 𝒜′ ⊭ B(v_n).

The above observations are clearly in contradiction, which completes our proof. □



